A spectral clustering approach to speaker diarization
Abstract
In this paper, we present a spectral clustering approach to explore the possibility of discovering structure from audio data. To apply the Ng-Jordan-Weiss (NJW) spectral clustering algorithm to speaker diarization, we propose domain-specific solutions to three open issues of the algorithm: the choice of metric, the selection of the scaling parameter, and the estimation of the number of clusters. A postprocessing step, “Cross EM refinement”, is then applied to further improve the performance of spectral learning. In experiments on audio data from Japanese Parliament Panel Discussions, this approach performs very similarly to traditional hierarchical clustering but runs much faster.
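The generic NJW procedure named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's system: the Euclidean metric, the fixed scaling parameter `sigma`, the known cluster count `k`, and the small deterministic k-means step are all assumptions made here, whereas the paper proposes its own domain-specific choices for these, which are not given on this page. The function name `njw_spectral_clustering` is hypothetical.

```python
import numpy as np


def njw_spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Cluster the rows of X into k groups with the generic NJW steps.

    Assumptions (not from the paper): Euclidean distance, a fixed
    scaling parameter sigma, and a known number of clusters k.
    """
    # 1. Gaussian affinity matrix with a zero diagonal.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # 2. Normalized affinity L = D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # 3. Eigenvectors of the k largest eigenvalues
    #    (eigh returns eigenvalues in ascending order).
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, -k:]

    # 4. Renormalize each row of Y to unit length.
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)

    # 5. K-means on the rows of Y, with a deterministic
    #    farthest-point initialization (an illustrative choice).
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)

    for _ in range(n_iter):
        sq = np.sum((Y[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

In a diarization setting, each row of `X` would stand for one speech segment's feature vector and each returned label for a hypothesized speaker. How segments are represented and compared is exactly the "choice of metric" question the abstract raises, so the Euclidean distance above is only a placeholder.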
Similar resources
On the Use of Spectral and Iterative Methods for Speaker Diarization
This paper extends our previous work using i-vectors for speaker diarization. We examine the effectiveness of spectral clustering as an alternative to our previous approach using K-means clustering and adapt a previously used heuristic to estimate the number of speakers. Additionally, we consider an iterative optimization scheme and experiment with its ability to improve both cluster assign...
Confidence for Speaker Diarization using PCA Spectral Ratio
Confidence scoring is an important component in speaker diarization systems, both for offline speech analytics and for online diarization systems that are required to produce the speaker segmentation from very little audio. This paper proposes a confidence measure for speaker diarization based on the spectral ratio of the eigenvalues of the Principal Component Analysis (PCA) transformation computed on ...
On the use of agglomerative and spectral clustering in speaker diarization of meetings
In this paper, we present a clustering algorithm for speaker diarization based on spectral clustering. State-of-the-art diarization systems are based on agglomerative hierarchical clustering using Bayesian Information Criterion and other statistical metrics among clusters which results in a high computational cost and in a time demanding approach. Our proposal avoids the use of such metrics app...
Speaker diarization using divide-and-conquer
Speaker diarization systems usually consist of two core components: speaker segmentation and speaker clustering. The current state-of-the-art speaker diarization systems usually apply hierarchical agglomerative clustering (HAC) for speaker clustering after segmentation. However, HAC’s quadratic computational complexity with respect to the number of data samples inevitably limits its application...
The Approach of Speaker Diarization by Gaussian Mixture Model (GMM)
Speaker identification is an important activity in the process of speaker diarization. For speaker identification, each speaker is modeled by a Gaussian mixture model (GMM). A large GMM, called a Universal Background Model (UBM), is adapted into each speaker model. This paper focuses on speech clustering for speaker diarization. The speaker d...